Discovering Non-binary Hierarchical Structures with Bayesian Rose Trees
نویسندگان
چکیده
Rich hierarchical structures are common across many disciplines, making the discovery of hierarchies a fundamental exploratory data analysis and unsupervised learning problem. Applications with natural hierarchical structure include topic hierarchies in text (Blei et al. 2010), phylogenies in evolutionary biology (Felsenstein 2003), hierarchical community structures in social networks (Girvan and Newman 2002), and psychological taxonomies (Rosch et al. 1976). A large variety of models and algorithms for discovering hierarchical structures have been proposed. These range from the traditional linkage algorithms based on distance metrics between data items (Duda and Hart 1973), to maximum parsimony and maximum likelihood methods in phylogenetics (Felsenstein 2003), to fully Bayesian approaches that compute posterior distributions over hierarchical structures (e.g. Neal 2003). We will review some of these in Section 1.2. A common feature of many of these methods is that their hypothesis spaces are restricted to binary trees, where each internal node in the hierarchical structure has exactly two children. This restriction is reasonable under certain circumstances, and is a natural output of the popular agglomerative approaches to discovering hierarchies, where each step involves the merger of two clusters of data items into one. However, we believe that there are good reasons why restricting to binary trees is often undesirable. Firstly, we simply do not believe that many hierarchies in real world applications are binary trees. Secondly, limiting the hypothesis space to binary trees often forces spurious structure to be “hallucinated” even if this structure
منابع مشابه
Bayesian Rose Trees
Hierarchical structure is ubiquitous in data across many domains. There are many hierarchical clustering methods, frequently used by domain experts, which strive to discover this structure. However, most of these methods limit discoverable hierarchies to those with binary branching structure. This limitation, while computationally convenient, is often undesirable. In this paper we explore a Bay...
متن کاملBinary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent
Discovering hierarchical regularities in data is a key problem in interacting with large datasets, modeling cognition, and encoding knowledge. A previous Bayesian solution—Kingman’s coalescent—provides a probabilistic model for data represented as a binary tree. Unfortunately, this is inappropriate for data better described by bushier trees. We generalize an existing belief propagation framewor...
متن کاملA New Heuristic Algorithm for Drawing Binary Trees within Arbitrary Polygons Based on Center of Gravity
Graphs have enormous usage in software engineering, network and electrical engineering. In fact graphs drawing is a geometrically representation of information. Among graphs, trees are concentrated because of their ability in hierarchical extension as well as processing VLSI circuit. Many algorithms have been proposed for drawing binary trees within polygons. However these algorithms generate b...
متن کاملMATHEMATICAL ENGINEERING TECHNICAL REPORTS Design and Implementation of General Tree Skeletons
Trees are important datatypes that are often used in representing structured data such as XML. Though trees are widely used in sequential programming, it is hard to write efficient parallel programs manipulating trees of arbitrary shapes, because of their irregular and ill-balanced structures. In this paper, we propose a solution for them based on the skeletal approach, in particular for genera...
متن کاملThe Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data
The goal of this study is to introduce a statistical method regarding the analysis of specific latent data for regression analysis of the discrete data and to build a relation between a probit regression model (related to the discrete response) and normal linear regression model (related to the latent data of continuous response). This method provides precise inferences on binary and multinomia...
متن کامل